Similarity measures for binary and numerical data: a survey

Authors

Marie-Jeanne Lesot

Maria Rifqi

Hamid Benhadda

Abstract

Similarity measures aim at quantifying the extent to which objects resemble each other. Many techniques in data mining, data analysis or information retrieval require a similarity measure, and selecting an appropriate measure for a given problem is a difficult task. In this paper, the diverse forms similarity measures can take are examined, as well as their relationships and respective properties. Their semantic differences are highlighted and numerical tools to quantify these differences are proposed, considering several points of view and including global and local comparisons, order-based and value-based comparisons, and mathematical properties such as derivability. The paper studies similarity measures for two types of data: binary and numerical data, i.e., set data represented by the presence or absence of characteristics and data represented by real vectors.
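To make the two data types named in the abstract concrete, here is a minimal sketch of one similarity measure for each. The choice of measures (Jaccard for binary data, cosine for numerical data) and the function names are illustrative assumptions, not taken from the paper, which surveys many more measures.

```python
import math


def jaccard(x, y):
    """Jaccard similarity between two equal-length 0/1 vectors.

    Each position encodes the presence (1) or absence (0) of a characteristic.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Characteristics present in both objects.
    both = sum(1 for xi, yi in zip(x, y) if xi and yi)
    # Characteristics present in at least one object.
    either = sum(1 for xi, yi in zip(x, y) if xi or yi)
    # Assumed convention: two objects with no characteristics are identical.
    return both / either if either else 1.0


def cosine(u, v):
    """Cosine similarity between two equal-length real vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(jaccard([1, 1, 0, 0], [1, 0, 1, 0]))
    print(cosine([1.0, 2.0], [2.0, 4.0]))
```

The two functions are not interchangeable: Jaccard ignores characteristics absent from both objects, while cosine depends only on the angle between the vectors and not on their lengths.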

Similar resources

Presenting a clustering algorithm for categorical data by combining measures

Clustering is one of the main techniques in data mining. It is a process that partitions a data set into groups such that the data within a cluster are as close to each other as possible and the data in different clusters differ as much as possible. Clustering algorithms are divided into two categories according to the type of data: clustering algorithms for numerical data and clustering algor...

A Survey of Binary Similarity and Distance Measures

The binary feature vector is one of the most common representations of patterns, and similarity and distance measures play a critical role in many problems such as clustering and classification. Ever since Jaccard proposed a similarity measure to classify ecological species in 1901, numerous binary similarity and distance measures have been proposed in various fields. Applying approp...

Order-Based Equivalence Degrees for Similarity and Distance Measures

To help choose similarity or distance measures for information retrieval systems, we compare the orders these measures induce and quantify their agreement by a degree of equivalence. We consider measures dedicated to both binary and numerical data, carry out experiments on both artificial and real data sets, and identify equivalent as well as quasi-equivalent measures that can...

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of the MTS analysis process. Much of the research in this context has focused on proposing novel similarity measures for the underlying data. However, given the countless techniques for estimating similarity between MTS, this field suffers from a lack of comparative...

INFORMATION MEASURES BASED TOPSIS METHOD FOR MULTICRITERIA DECISION MAKING PROBLEM IN INTUITIONISTIC FUZZY ENVIRONMENT

In fuzzy set theory, information measures play a paramount role in several areas such as decision making and pattern recognition. In this paper, a similarity measure based on the cosine function and entropy measures based on the logarithmic function are proposed for intuitionistic fuzzy sets (IFSs). The proposed similarity and entropy measures are compared with existing ones. Numerical results limpidly betoken th...

Journal title:

IJKESDP

Volume 1, Issue

Pages -

Publication date 2009
